ABSTRACT


THE NUMERICAL EVALUATION OF THE MAXIMUM-LIKELIHOOD
ESTIMATE OF A SUBSET OF
MIXTURE PROPORTIONS

In this note, we give necessary and sufficient conditions for a maximum- 
likelihood estimate of a subset of mixture proportions. From these conditions, 
we derive likelihood equations satisfied by the maximum-likelihood estimate 
and discuss a successive-approximations procedure suggested by these equations 
for numerically evaluating the maximum-likelihood estimate. It is shown that, 
with probability 1 for large samples, this procedure converges locally to the
maximum-likelihood estimate whenever a certain step-size lies between 0 and 2. 
Furthermore, optimal rates of local convergence are obtained for a step-size 
which is bounded below by a number between 1 and 2. 


The Numerical Evaluation of the Maximum-Likelihood 
Estimate of a Subset of Mixture Proportions 

by 

B. Charles Peters, Jr. 

NASA/National Research Council Research Associate
Earth Observations Division, Johnson Space Center 

and 

*Homer F. Walker 
Department of Mathematics 
University of Houston 
Houston, Texas 

1. Introduction.

Let $x$ be an $n$-dimensional random variable whose density function is a
convex combination of density functions $p_0, p_1, \ldots, p_m$ on $\mathbb{R}^n$. In particular,
suppose that the density function of $x$ is $p(x,\alpha^0)$, a member of the
parametric family of density functions

$$p(x,\alpha) = \beta \sum_{i=1}^{m} \alpha_i p_i(x) + (1 - \beta)\,p_0(x)$$

for $x \in \mathbb{R}^n$, where $\alpha = (\alpha_1,\ldots,\alpha_m)^T$ and $\beta$ satisfy the following
constraints: (i) $0 < \beta \le 1$ and $0 \le \alpha_i \le 1$ for $i = 1,\ldots,m$; (ii) $\sum_{i=1}^{m} \alpha_i = 1$.

*This author was supported in part by NASA under Contract JSC-NAS-9-12777.

In this note, we assume that $\beta$ and the density functions $p_0,\ldots,p_m$ are
known, and we address the problem of numerically estimating $\alpha^0$, the vector
of unknown mixture proportions, on the basis of a given sample $\{x_k\}_{k=1}^{N}$
of independent observations on $x$.

To be more specific, we define a maximum-likelihood estimate of $\alpha^0$,
based on the given sample, to be a choice of $\alpha$ which satisfies the constraints
(i) and (ii) above and which maximizes the log-likelihood function

$$L(\alpha) = \sum_{k=1}^{N} \log p(x_k,\alpha).$$

(We assume throughout this report that $p(x_k,\alpha) \ne 0$ for $k = 1,\ldots,N$ and for
all $\alpha$ satisfying the given constraints.) In the following, we derive necessary
and sufficient conditions for $\bar\alpha$ to be a maximum-likelihood estimate of $\alpha^0$,
and we discuss a particular iterative procedure based on these conditions
for the numerical evaluation of a maximum-likelihood estimate.
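
As a minimal sketch of the definitions above, the following Python evaluates $p(x,\alpha)$, the constraints (i) and (ii), and the log-likelihood from a table of component density values. The table layout `P[k] = [p_0(x_k), ..., p_m(x_k)]`, the function names, and the numbers in the example are assumptions made for illustration; none of them appear in the original report.

```python
import math

def mixture_density(p_row, alpha, beta):
    """Return p(x, alpha) = beta * sum_i alpha_i p_i(x) + (1 - beta) p_0(x).

    p_row is [p_0(x), p_1(x), ..., p_m(x)] (an assumed layout).
    """
    known_part = (1.0 - beta) * p_row[0]
    unknown_part = beta * sum(a * p for a, p in zip(alpha, p_row[1:]))
    return unknown_part + known_part

def in_constraint_set(alpha, beta, tol=1e-12):
    """Check (i) 0 < beta <= 1, 0 <= alpha_i <= 1 and (ii) sum_i alpha_i = 1."""
    return (0.0 < beta <= 1.0
            and all(-tol <= a <= 1.0 + tol for a in alpha)
            and abs(sum(alpha) - 1.0) <= tol)

def log_likelihood(P, alpha, beta):
    """Return L(alpha) = sum_k log p(x_k, alpha); needs p(x_k, alpha) > 0."""
    return sum(math.log(mixture_density(row, alpha, beta)) for row in P)

# Made-up table: N = 3 observations, m = 2 unknown proportions.
P = [[1.0, 2.0, 0.5],
     [0.5, 0.1, 1.5],
     [0.2, 0.7, 0.7]]
print(log_likelihood(P, [0.5, 0.5], beta=0.5))
```

Working from precomputed density values keeps the sketch independent of any particular family of component densities.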

The results given here generalize those of [2], in which a restricted
iterative procedure is considered in the special case $\beta = 1$. We also remark
that our results apply to the problem of numerically evaluating a maximum-likelihood
estimate of a proper subset $\{\alpha_i\}_{i=1}^{m}$ of mixture proportions
in a density $p = \sum_{i=1}^{s} \alpha_i p_i$ when the remaining $s - m$ proportions are known.
Indeed, this problem is seen to be of the type considered here by taking

$$\beta = 1 - \sum_{i=m+1}^{s} \alpha_i, \qquad p_0 = \frac{1}{1-\beta} \sum_{i=m+1}^{s} \alpha_i p_i,$$

and by rescaling the unknown proportions to $\alpha_i/\beta$, $i = 1,\ldots,m$, so that they sum to 1.
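
The reduction can be checked numerically. The sketch below uses a hypothetical helper name and made-up density values at a single point $x$; it forms $\beta$ and $p_0(x)$ from the known proportions, rescales the unknown ones by $1/\beta$, and compares the two expressions for the density.

```python
def reduce_to_subset_problem(alpha_full, p_values, m):
    """Rewrite p = sum_{i=1}^s alpha_i p_i, with alpha_{m+1..s} known, in (beta, p_0) form.

    alpha_full: all s proportions (summing to 1); p_values: [p_1(x), ..., p_s(x)].
    Returns beta, p_0(x), and the rescaled unknown proportions alpha_i / beta.
    """
    known_mass = sum(alpha_full[m:])          # sum_{i=m+1}^s alpha_i = 1 - beta
    beta = 1.0 - known_mass
    if known_mass > 0.0:
        p0 = sum(a * p for a, p in zip(alpha_full[m:], p_values[m:])) / known_mass
    else:
        p0 = 0.0                              # beta = 1: p_0 plays no role
    rescaled = [a / beta for a in alpha_full[:m]]
    return beta, p0, rescaled

alpha_full = [0.5, 0.3, 0.15, 0.05]   # s = 4; the last two are taken as known
p_values = [0.2, 1.1, 0.4, 2.0]       # made-up values p_i(x)
beta, p0, rescaled = reduce_to_subset_problem(alpha_full, p_values, m=2)

direct = sum(a * p for a, p in zip(alpha_full, p_values))
reduced = beta * sum(a * p for a, p in zip(rescaled, p_values[:2])) + (1.0 - beta) * p0
print(beta, direct, reduced)
```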



2. The likelihood equations. 


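One easily verifies that the log-likelihood function $L$ is a concave
function of $\alpha$ on the constraint set, i.e., the set of elements of $\mathbb{R}^m$
satisfying the constraints (i) and (ii) given in the introduction. It
follows that a necessary and sufficient condition for $\bar\alpha$ to be a
maximum-likelihood estimate of $\alpha^0$ is that $\nabla L(\bar\alpha)(\alpha' - \bar\alpha) \le 0$ for all $\alpha'$ in the
constraint set, where $\nabla L(\alpha) = \left(\frac{\partial L}{\partial \alpha_1}(\alpha), \ldots, \frac{\partial L}{\partial \alpha_m}(\alpha)\right)$. Since this inequality
holds if and only if it holds whenever $\alpha'$ is an extreme point of the constraint
set, one concludes that $\bar\alpha$ is a maximum-likelihood estimate if and only if,
for $i = 1,\ldots,m$, $\frac{\partial L}{\partial \alpha_i}(\bar\alpha) \le \nabla L(\bar\alpha)\bar\alpha$, with equality if $\bar\alpha_i > 0$. We reformulate
this result as the following necessary and sufficient condition for $\bar\alpha$ to be
a maximum-likelihood estimate of $\alpha^0$: For $i = 1,\ldots,m$,

$$\sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k,\bar\alpha)} \le \sum_{j=1}^{m} \bar\alpha_j \sum_{k=1}^{N} \frac{p_j(x_k)}{p(x_k,\bar\alpha)}, \tag{1}$$

with equality if $\bar\alpha_i > 0$.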

Multiplying both sides of (1) by $\bar\alpha_i$ and rearranging gives the following
necessary condition for $\bar\alpha$ to be a maximum-likelihood estimate:

$$\bar\alpha_i = A_i(\bar\alpha) \equiv \bar\alpha_i \left. \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k,\bar\alpha)} \right/ \sum_{j=1}^{m} \bar\alpha_j \sum_{k=1}^{N} \frac{p_j(x_k)}{p(x_k,\bar\alpha)} \tag{2}$$

for $i = 1,\ldots,m$. This condition is not sufficient in general for $\bar\alpha$ to be a
maximum-likelihood estimate. Indeed, this condition is satisfied by each extreme
point of the constraint set. However, this condition is sufficient as well as
necessary for $\bar\alpha$ to be a maximum-likelihood estimate which lies in the interior
of the constraint set, i.e., the components of which satisfy $\bar\alpha_i > 0$ for
$i = 1,\ldots,m$. We refer to the equations (2) as the likelihood equations.
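
The following sketch evaluates the right-hand side of the likelihood equations, taking $A_i(\alpha) = \alpha_i g_i(\alpha) / \sum_j \alpha_j g_j(\alpha)$ with $g_i(\alpha) = \sum_k p_i(x_k)/p(x_k,\alpha)$. This form is an assumption reconstructed from the surrounding text, and the function name, table layout `P[k] = [p_0(x_k), ..., p_m(x_k)]`, and numbers are hypothetical. The example illustrates the remark that every extreme point of the constraint set satisfies (2), so (2) alone cannot characterize a maximum-likelihood estimate.

```python
def likelihood_operator(P, alpha, beta):
    """Return A(alpha), where A_i = alpha_i g_i / sum_j alpha_j g_j
    and g_i = sum_k p_i(x_k) / p(x_k, alpha)."""
    m = len(alpha)
    # Mixture density p(x_k, alpha) at every observation.
    dens = [beta * sum(a * row[i + 1] for i, a in enumerate(alpha))
            + (1.0 - beta) * row[0] for row in P]
    g = [sum(row[i + 1] / d for row, d in zip(P, dens)) for i in range(m)]
    total = sum(a * gi for a, gi in zip(alpha, g))
    return [a * gi / total for a, gi in zip(alpha, g)]

# Made-up table: N = 3 observations, m = 2 unknown proportions.
P = [[1.0, 2.0, 0.5],
     [0.5, 0.1, 1.5],
     [0.2, 0.7, 0.7]]
beta = 0.6

interior = likelihood_operator(P, [0.3, 0.7], beta)
vertex = likelihood_operator(P, [1.0, 0.0], beta)
print(interior, sum(interior))   # components stay non-negative and sum to 1
print(vertex)                    # the extreme point is mapped to itself
```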
3. The iterative procedure.

We now define an iterative procedure based on the likelihood equations and
discuss its applicability to the problem of numerically evaluating a
maximum-likelihood estimate of $\alpha^0$. Setting $A(\alpha) = (A_1(\alpha),\ldots,A_m(\alpha))^T$, we write the
likelihood equations as

$$\alpha = A(\alpha). \tag{3}$$

Equivalent to (3) is the equation

$$\alpha = \Phi_\varepsilon(\alpha) \equiv (1 - \varepsilon)\alpha + \varepsilon A(\alpha) \tag{4}$$

for any non-zero number $\varepsilon$. (Of course, (4) becomes (3) when $\varepsilon = 1$.) Note that
the continuous nonlinear operator $A$ maps the constraint set into itself. For
any $\varepsilon$ and any $\alpha$ in the constraint set, the components of $\Phi_\varepsilon(\alpha)$ sum to 1;
however, the components of $\Phi_\varepsilon(\alpha)$ are guaranteed to be non-negative for all $\alpha$
in the constraint set only if $0 \le \varepsilon \le 1$.

The iterative procedure suggested by (4) is the following: Beginning
with some starting value $\alpha^{(1)}$ in the constraint set, define successive iterates
inductively by

$$\alpha^{(j+1)} = \Phi_\varepsilon(\alpha^{(j)}) \tag{5}$$

for $j = 1,2,\ldots$. We observe that if the sequence of iterates defined by (5)
converges, then its limit is a fixed point of $\Phi_\varepsilon$ and, hence, of $A$. Our first
theorem gives sufficient conditions for such a limit to be a maximum-likelihood
estimate. The proof of the theorem is virtually the same as that of the
corresponding theorem in [2], and we omit it.

THEOREM 1: Suppose that $\alpha^{(1)}$ lies in the interior of the constraint set and
that $0 < \varepsilon \le 1$. If the sequence of iterates defined by (5) converges, then
its limit is a maximum-likelihood estimate of $\alpha^0$.
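
A minimal sketch of the successive-approximations procedure follows. It assumes $A_i(\alpha) = \alpha_i g_i / \sum_j \alpha_j g_j$ with $g_i = \sum_k p_i(x_k)/p(x_k,\alpha)$ and applies $\alpha^{(j+1)} = (1-\varepsilon)\alpha^{(j)} + \varepsilon A(\alpha^{(j)})$ from an interior starting value. The unit-variance normal components, the fixed sample `xs`, the value of `beta`, the stopping rule, and all function names are hypothetical choices made only so that the example runs; the report itself prescribes none of them.

```python
import math

def normal_pdf(x, mean):
    """Unit-variance normal density (chosen only for illustration)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def operator_A(P, alpha, beta):
    """A_i(alpha) = alpha_i g_i / sum_j alpha_j g_j, g_i = sum_k p_i(x_k)/p(x_k, alpha)."""
    dens = [beta * sum(a * row[i + 1] for i, a in enumerate(alpha))
            + (1.0 - beta) * row[0] for row in P]
    g = [sum(row[i + 1] / d for row, d in zip(P, dens)) for i in range(len(alpha))]
    total = sum(a * gi for a, gi in zip(alpha, g))
    return [a * gi / total for a, gi in zip(alpha, g)]

def phi(P, alpha, beta, eps):
    """Phi_eps(alpha) = (1 - eps) * alpha + eps * A(alpha)."""
    return [(1.0 - eps) * a + eps * b
            for a, b in zip(alpha, operator_A(P, alpha, beta))]

def iterate(P, alpha, beta, eps=1.0, tol=1e-13, max_iter=2000):
    """Repeat alpha <- Phi_eps(alpha) until successive iterates differ by < tol."""
    for step in range(1, max_iter + 1):
        new = phi(P, alpha, beta, eps)
        if max(abs(n - a) for n, a in zip(new, alpha)) < tol:
            return new, step
        alpha = new
    return alpha, max_iter

# Hypothetical problem: p_0 = N(0,1) known with weight 1 - beta,
# p_1 = N(-3,1) and p_2 = N(3,1) with unknown proportions.
xs = [-4.1, -3.4, -3.0, -2.6, -1.9, -0.6, 0.0, 0.7, 2.1, 2.5, 2.9, 3.2, 3.6, 4.0]
means = [0.0, -3.0, 3.0]
beta = 0.8
P = [[normal_pdf(x, mu) for mu in means] for x in xs]

alpha_hat, steps = iterate(P, [0.5, 0.5], beta, eps=1.0)
print(alpha_hat, steps)
```

With $0 < \varepsilon \le 1$ the iterates cannot leave the constraint set, which is why the sketch performs no projection step; for larger $\varepsilon$ a practical code would need to guard against negative components.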

In order to give sufficient conditions for the convergence of the iterates
defined by (5), we need to make further assumptions concerning the density
functions $p_0,\ldots,p_m$. Henceforth, we assume that they are linearly independent,
i.e., that any linear combination $\sum_{i=0}^{m} c_i p_i$, with $\sum_{i=0}^{m} c_i^2 \ne 0$, does not vanish
identically on $\mathbb{R}^n$. This insures that, with probability 1, there exists a
unique maximum-likelihood estimate for large $N$ which converges to $\alpha^0$ as $N$
approaches infinity. (See, for example, Appendix 1 of [3].) Our aim is to
establish the following result.

THEOREM 2: Suppose that $\alpha^0$ lies in the interior of the constraint set and
that $0 < \varepsilon < 2$. Then with probability 1 as $N$ approaches infinity, $\Phi_\varepsilon$
is a local contraction on the constraint set near $\bar\alpha$, the (unique)
maximum-likelihood estimate of $\alpha^0$. If the density functions $p_0,\ldots,p_m$ are analytic
as well as linearly independent, then $\Phi_\varepsilon$ is a local contraction on the
constraint set near $\bar\alpha$ with probability 1 whenever $\bar\alpha$ lies in the interior of
the constraint set and $N \ge m$.

In saying that $\Phi_\varepsilon$ is a local contraction on the constraint set near $\bar\alpha$,
we mean that there exists a norm $\|\cdot\|$ on $\mathbb{R}^m$ and a constant $\lambda$, $0 \le \lambda < 1$,
such that

$$\|\Phi_\varepsilon(\alpha') - \bar\alpha\| \le \lambda \|\alpha' - \bar\alpha\| \tag{6}$$

for all $\alpha'$ in the constraint set which lie sufficiently near $\bar\alpha$. Our sufficient
conditions for the convergence of the iterates defined by (5) are stated in
the corollary below, which is an immediate consequence of Theorem 2 and the
inequality (6).

COROLLARY: Suppose that $\alpha^0$ lies in the interior of the constraint set and
that $0 < \varepsilon < 2$. Then with probability 1 as $N$ approaches infinity, the
iterates defined by (5) converge to $\bar\alpha$, the (unique) maximum-likelihood
estimate of $\alpha^0$, whenever $\alpha^{(1)}$ lies sufficiently near $\bar\alpha$. If the density
functions $p_0,\ldots,p_m$ are analytic as well as linearly independent, then, with
probability 1 whenever $\bar\alpha$ lies in the interior of the constraint set and
$N \ge m$, the iterates converge to $\bar\alpha$ whenever $\alpha^{(1)}$ lies sufficiently near
$\bar\alpha$.

Proof of Theorem 2: In proving the first statement of the theorem, it may be
assumed that the (unique) maximum-likelihood estimate $\bar\alpha$ lies in the interior
of the constraint set. (By the remarks preceding the theorem, the probability
is 1 that this occurs for large $N$.) Assuming $0 < \varepsilon < 2$, we must show that,
with probability 1 as $N$ approaches infinity, an inequality of the form (6)
holds.

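For any norm $\|\cdot\|$ on $\mathbb{R}^m$, one can write

$$\Phi_\varepsilon(\alpha') - \bar\alpha = \nabla\Phi_\varepsilon(\bar\alpha)[\alpha' - \bar\alpha] + O(\|\alpha' - \bar\alpha\|^2).$$

In this expression, $\nabla\Phi_\varepsilon(\bar\alpha)$ denotes the $m \times m$ matrix whose $ij$-th entry is
$\partial/\partial\alpha_j$ of the $i$-th component of $\Phi_\varepsilon(\bar\alpha)$. It follows that the first statement of the
theorem will be proved if it can be shown that, with probability 1 as $N$
approaches infinity, there exists a norm $\|\cdot\|$ on $\mathbb{R}^m$ and a number $\lambda$,
$0 \le \lambda < 1$, for which an inequality of the form

$$\|\nabla\Phi_\varepsilon(\bar\alpha)\gamma\| \le \lambda \|\gamma\| \tag{7}$$

holds for all $\gamma$ in the subspace

$$\mathcal{E} = \Bigl\{\gamma = (\gamma_1,\ldots,\gamma_m)^T \in \mathbb{R}^m : \sum_{i=1}^{m} \gamma_i = 0\Bigr\}.$$

Using the fact that $\bar\alpha$ satisfies the likelihood equations (2), one
verifies that $\nabla\Phi_\varepsilon(\bar\alpha) = I - \varepsilon Q$, where

$$Q = \Bigl[\sum_{i=1}^{m} \bar\alpha_i \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k,\bar\alpha)}\Bigr]^{-1} (\operatorname{diag} \bar\alpha_i) \sum_{k=1}^{N} \left\{ \beta \begin{pmatrix} \frac{p_1(x_k)}{p(x_k,\bar\alpha)} \\ \vdots \\ \frac{p_m(x_k)}{p(x_k,\bar\alpha)} \end{pmatrix} \begin{pmatrix} \frac{p_1(x_k)}{p(x_k,\bar\alpha)} \\ \vdots \\ \frac{p_m(x_k)}{p(x_k,\bar\alpha)} \end{pmatrix}^{T} + (1-\beta) \frac{p_0(x_k)}{p(x_k,\bar\alpha)} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} \frac{p_1(x_k)}{p(x_k,\bar\alpha)} \\ \vdots \\ \frac{p_m(x_k)}{p(x_k,\bar\alpha)} \end{pmatrix}^{T} \right\}.$$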


It is easily shown that $\mathcal{E}$ is invariant under $Q$ and, hence, under $\nabla\Phi_\varepsilon(\bar\alpha)$.
To establish an inequality of the form (7), it suffices to show that, with
probability 1 as $N$ approaches infinity, there exists a norm on $\mathcal{E}$ with
respect to which the operator norm of $\nabla\Phi_\varepsilon(\bar\alpha)$ is less than 1.

Define an inner product $\langle\cdot,\cdot\rangle$ on $\mathcal{E}$ by $\langle\gamma,\gamma'\rangle = \gamma^T (\operatorname{diag} \bar\alpha_i)^{-1} \gamma'$ for
$\gamma$ and $\gamma'$ in $\mathcal{E}$. With respect to this inner product, $Q$ is symmetric (in
fact, positive semi-definite) on $\mathcal{E}$. It follows that, with respect to the
norm on $\mathcal{E}$ defined by this inner product, the operator norm of $\nabla\Phi_\varepsilon(\bar\alpha)$ on $\mathcal{E}$ is
less than 1 if the eigenvalues of $Q$ which correspond to eigenvectors in $\mathcal{E}$
lie in the interval $(0,1]$.

Now $Q$ is column stochastic; hence, no eigenvalue of $Q$ on $\mathcal{E}$ is greater
than 1. (See [1] for a discussion of column stochastic matrices.) To complete
the proof of the first statement of the theorem, it need only be shown
that, with probability 1 as $N$ approaches infinity, $Q$ is positive-definite
on $\mathcal{E}$. Now, with probability 1, $\bar\alpha$ converges to $\alpha^0$ as $N$ approaches
infinity. Using arguments analogous to those employed in [3], one verifies
that, with probability 1, $Q$ converges as $N$ approaches infinity to

$$(\operatorname{diag} \alpha_i^0) \int_{\mathbb{R}^n} \left\{ \beta \begin{pmatrix} \frac{p_1(x)}{p(x,\alpha^0)} \\ \vdots \\ \frac{p_m(x)}{p(x,\alpha^0)} \end{pmatrix} \begin{pmatrix} \frac{p_1(x)}{p(x,\alpha^0)} \\ \vdots \\ \frac{p_m(x)}{p(x,\alpha^0)} \end{pmatrix}^{T} + (1-\beta) \frac{p_0(x)}{p(x,\alpha^0)} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} \frac{p_1(x)}{p(x,\alpha^0)} \\ \vdots \\ \frac{p_m(x)}{p(x,\alpha^0)} \end{pmatrix}^{T} \right\} p(x,\alpha^0)\,dx,$$

a positive-definite operator on $\mathcal{E}$. It follows that $Q$ is positive-definite
on $\mathcal{E}$ with probability 1 as $N$ approaches infinity, and the first statement
of the theorem is proved.

To prove the second statement of the theorem, suppose that $N \ge m$, that
$\bar\alpha$ lies in the interior of the constraint set, and that $p_0,\ldots,p_m$ are analytic
as well as linearly independent. Repeating the above argument with only minor
changes, one obtains the desired result by finally observing that, as a
consequence of the lemma in Appendix 2 of [3], $Q$ is positive-definite on $\mathcal{E}$
with probability 1 whenever $N \ge m$. This completes the proof of the theorem.
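
The structure used in the proof can be inspected numerically. The sketch below assumes the entrywise form $Q_{ij} = (\bar\alpha_i / D) \sum_k [\beta p_i(x_k) + (1-\beta) p_0(x_k)]\, p_j(x_k) / p(x_k,\bar\alpha)^2$ with $D = \sum_i \bar\alpha_i \sum_k p_i(x_k)/p(x_k,\bar\alpha)$, which is what differentiating $A_i(\alpha) = \alpha_i g_i / \sum_j \alpha_j g_j$ at an interior solution of the likelihood equations gives. The normal components, the sample, and all names are hypothetical. For $m = 2$ the subspace $\mathcal{E}$ is spanned by $(1,-1)^T$, so the single eigenvalue of $Q$ on $\mathcal{E}$ can be read off without an eigenvalue routine.

```python
import math

def normal_pdf(x, mean):
    """Unit-variance normal density (chosen only for illustration)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def densities(P, alpha, beta):
    """p(x_k, alpha) for every row P[k] = [p_0(x_k), ..., p_m(x_k)]."""
    return [beta * sum(a * row[i + 1] for i, a in enumerate(alpha))
            + (1.0 - beta) * row[0] for row in P]

def operator_A(P, alpha, beta):
    """A_i(alpha) = alpha_i g_i / sum_j alpha_j g_j."""
    dens = densities(P, alpha, beta)
    g = [sum(row[i + 1] / d for row, d in zip(P, dens)) for i in range(len(alpha))]
    total = sum(a * gi for a, gi in zip(alpha, g))
    return [a * gi / total for a, gi in zip(alpha, g)]

def matrix_Q(P, alpha, beta):
    """Q_ij = (alpha_i / D) sum_k [beta p_i + (1 - beta) p_0] p_j / p^2 at x_k."""
    m = len(alpha)
    dens = densities(P, alpha, beta)
    D = sum(alpha[i] * sum(row[i + 1] / d for row, d in zip(P, dens))
            for i in range(m))
    return [[alpha[i] / D
             * sum((beta * row[i + 1] + (1.0 - beta) * row[0]) * row[j + 1] / d ** 2
                   for row, d in zip(P, dens))
             for j in range(m)] for i in range(m)]

# Hypothetical problem: p_0 = N(0,1), p_1 = N(-3,1), p_2 = N(3,1), beta = 0.8.
xs = [-4.1, -3.4, -3.0, -2.6, -1.9, -0.6, 0.0, 0.7, 2.1, 2.5, 2.9, 3.2, 3.6, 4.0]
beta = 0.8
P = [[normal_pdf(x, mu) for mu in (0.0, -3.0, 3.0)] for x in xs]

alpha_bar = [0.5, 0.5]
for _ in range(500):                      # eps = 1 iteration to approximate the MLE
    alpha_bar = operator_A(P, alpha_bar, beta)

Q = matrix_Q(P, alpha_bar, beta)
col_sums = [Q[0][j] + Q[1][j] for j in range(2)]
mu = Q[0][0] - Q[0][1]                    # Q (1,-1)^T = mu (1,-1)^T when columns sum to 1
print(col_sums, mu)
print([abs(1.0 - eps * mu) for eps in (0.5, 1.0, 1.5, 1.9)])  # contraction factors
```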
4. The optimal $\varepsilon$.

The corollary of Theorem 2 may be summarized by saying that, if $\alpha^0$ lies
in the interior of the constraint set, then, with probability 1 for large
samples, the iterates defined by (5) converge locally to the maximum-likelihood
estimate $\bar\alpha$ whenever $0 < \varepsilon < 2$. Thus the iterative procedure (5), which is
a generalized steepest-ascent (deflected-gradient) method, has the particularly
important property of converging locally to $\bar\alpha$ whenever the step-size $\varepsilon$ lies
in an interval which is completely independent of the particular mixture problem
at hand. Furthermore, if $\varepsilon$ is no greater than 1, then the successive
iterates defined by (5) are guaranteed to remain in the constraint set. It
is readily ascertained that these properties are not shared by the usual
steepest-ascent procedure, given by


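$$\alpha_i^{(j+1)} = \alpha_i^{(j)} + \varepsilon \left[ \frac{1}{N} \sum_{k=1}^{N} \frac{p_i(x_k)}{p(x_k,\alpha^{(j)})} - \frac{1}{mN} \sum_{l=1}^{m} \sum_{k=1}^{N} \frac{p_l(x_k)}{p(x_k,\alpha^{(j)})} \right]$$

for $i = 1,\ldots,m$.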

We now observe that there exists a particular value of $\varepsilon$, referred to
as "the optimal $\varepsilon$," which yields, with probability 1 for large samples,
the fastest uniform rate of local convergence of (5) near $\bar\alpha$. Indeed, suppose
that $\bar\alpha$ is an interior point of the constraint set and that $Q$ is
positive-definite on $\mathcal{E}$. (Recall that, with probability 1, these assumptions
are valid for large samples.) Then one sees from the proof of Theorem 2 that
the optimal $\varepsilon$ is the unique value of $\varepsilon$ which minimizes the spectral radius
of $\nabla\Phi_\varepsilon(\bar\alpha) = I - \varepsilon Q$, regarded as an operator on $\mathcal{E}$. ($\nabla\Phi_\varepsilon(\bar\alpha)$ is symmetric on
$\mathcal{E}$ with respect to the inner product $\langle\cdot,\cdot\rangle$ defined previously. Consequently,
its operator norm with respect to this inner product is equal to its spectral
radius and, hence, minimal.) It is easily verified that the optimal $\varepsilon$ is
given by $1 - \varepsilon\tau = \varepsilon\rho - 1$, i.e., $\varepsilon = \frac{2}{\rho + \tau}$, where $\rho$ and $\tau$ are, respectively,
the largest and smallest eigenvalues of the operator $Q$ restricted to $\mathcal{E}$.
It follows from the proof of Theorem 2 that $\rho$ is never greater than 1.
Thus the optimal $\varepsilon$ is bounded below by $\frac{2}{1 + \tau}$, where $\tau$ lies between 0 and
1. In particular, this lower bound on the optimal $\varepsilon$ lies between 1 and 2.

It should be noted that, if $\rho + \tau$ is strictly less than 1 (in particular, if $\rho$
is strictly less than 1/2), then the optimal $\varepsilon$
is actually greater than 2, even though Theorem 2 fails to guarantee the
local convergence of (5) for such values of $\varepsilon$. We also observe that, despite
the fact that the column-stochastic matrix $Q$ always has 1 as an eigenvalue,
the eigenvalue $\rho$ of the restricted operator $Q$ on $\mathcal{E}$ can be arbitrarily
small (and, hence, the optimal $\varepsilon$ can be arbitrarily large). Indeed, $Q$ is
nearly the zero operator on $\mathcal{E}$ if the component populations in the mixture are
nearly identical.
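
The balance condition $1 - \varepsilon\tau = \varepsilon\rho - 1$ can be illustrated with a few lines of arithmetic. In the sketch below, the function names and the sample eigenvalue pairs are hypothetical; `spectral_radius` uses the fact that, for a symmetric operator with eigenvalues between $\tau$ and $\rho$, the largest value of $|1 - \varepsilon\mu|$ is attained at one of the two extreme eigenvalues.

```python
def spectral_radius(eps, rho, tau):
    """Spectral radius of I - eps*Q on E when the eigenvalues of Q there lie in [tau, rho]."""
    return max(abs(1.0 - eps * rho), abs(1.0 - eps * tau))

def optimal_step(rho, tau):
    """Solve 1 - eps*tau = eps*rho - 1 for eps; return (eps, resulting spectral radius)."""
    eps = 2.0 / (rho + tau)
    return eps, spectral_radius(eps, rho, tau)

# (rho, tau) pairs: well separated, poorly separated, and a case with rho + tau < 1.
for rho, tau in [(1.0, 0.9), (1.0, 0.1), (0.4, 0.2)]:
    eps, rate = optimal_step(rho, tau)
    print(f"rho={rho} tau={tau} optimal eps={eps:.4f} "
          f"rate={rate:.4f} rate at eps=1: {spectral_radius(1.0, rho, tau):.4f}")
```

At the optimal step-size the resulting contraction factor is $(\rho - \tau)/(\rho + \tau)$, so it depends only on the ratio $\tau/\rho$.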

Suppose that the component populations in the mixture are "widely separated"
in the sense that, for $i \ne j$,

$$\frac{p_i(x_k)\,p_j(x_k)}{p(x_k,\bar\alpha)^2} \approx 0$$

for $k = 1,\ldots,N$. Then $Q \approx I$ and, hence, $\rho$ and $\tau$ must lie near 1. One
concludes that, with probability 1 for large samples, the fastest uniform
rate of local convergence of (5) is obtained for $\varepsilon$ near 1, and for the
optimal $\varepsilon$, $\nabla\Phi_\varepsilon(\bar\alpha) = I - \varepsilon Q \approx 0$. Thus for mixtures whose component populations
are widely separated, the optimal $\varepsilon$ is only slightly greater than 1, and
rapid first-order local convergence of (5) to $\bar\alpha$ can be expected for this $\varepsilon$.
Now suppose that two or more of the component populations in the mixture
are nearly identical in the sense that, for some pair of distinct, non-zero
indices $i$ and $j$, $p_i(x_k) \approx p_j(x_k)$ for $k = 1,\ldots,N$. Then $Q$ is nearly
singular, and hence, $\tau$ is near zero. Consequently, the optimal $\varepsilon$ cannot
be much smaller than 2. We remark that, if $\rho$ is near 1 in this case, then
the optimal $\varepsilon$ must lie near 2. Then the spectral radius of $\nabla\Phi_\varepsilon(\bar\alpha)$ on $\mathcal{E}$
is near 1, even for the optimal $\varepsilon$, and it follows that slow first-order
local convergence of (5) to $\bar\alpha$ can be expected in this case.